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Abstract 

Manipulation of multimedia data is not straightforward as in conventional databases. One 
main problem is the retrieval of multimedia data from the database with the need to match the 
contents of multimedia data to a user query. In order to achieve a content based retrieval in 
our approach, we use natural language captions which allow the user to describe the contents 
of multimedia data. In a similar manner, users will specify their queries on multimedia data 
contents in natural language form. A problem is that different or even the same user describe 
the same thing differently at different times which results in the descriptions of the contents 
of multimedia data to rarely exactly match the descriptions of the user queries. Hence, partial 
or approximate match between descriptions of multimedia data and user queries is generally 
required during multimedia data retrieval. We propose an intelligent approach to approximate 
match by integrating both object-oriented and natural language understanding techniques. In 
order to make the query specification process easier we also develop a graphical user interface 
supporting incremental query specification and a natural way of expressing joins. The Multi- 
media Database Management System (MDBMS) described in this paper incorporates the ca- 
pabilities as mentioned above. 
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1. Introduction 

A multimedia database management system supports the management of multimedia data, 
which includes image and sound among others, in addition to supporting conventional databases. 
Multimedia systems are currently gaining a lot of attention because technology today has made it 
possible to capture and store multimedia data in computers. Multimedia data broadens the commu- 
nication between the computer system and the user. Many applications like military, publishing, or 
instructional routinely need multimedia data. Although the cost of the hardware required to handle 
multimedia data is decreasing rapidly, the software needed to manage such multimedia data is lack- 
ing or does not match the needs. 

In this paper we present a Multimedia Database Management System (MDBMS). The system 
allows sophisticated handling of multimedia data featuring an intelligent data retrieval as well as a 
graphical interface for user interaction. Besides describing the overall system architecture the im- 
portant parts of the system such as Parser, Matcher and Graphical User Interface will be presented 
in more detail. 

One important achievement of the MDBMS system is the efficient method for the retrieval of 
multimedia data by way of inexact matching. In conventional databases, retrieval of standard nu- 
merical and alphanumeric data is handled by utilizing the content of the data. The fundamental 
problem that one must face in the context of a multimedia database is the question of how to handle 
content search. There is no easy solution. It is difficult to find the appropriate data conveniendy and 
efficiently based on the contents of the multimedia data because they are intrinsically rich in se- 
mantics. In developing an efficient retrieval method for multimedia data, we concluded that it is 
not possible to utilize the content directly with today’s technology. This is a fair conclusion since 
the content of a multimedia data is mostly unstructured complex data like an image or a sound. 

In our MDBMS system we use the approach of content based search by means of verbal de- 
scriptions on the contents of multimedia data. We argue that the well known keyword approach to 
content description is not suitable because it has been known to be imprecise and the users often 
have difficulty in focusing the search to data of interest. Hence, we adopt the natural language ap- 
proach to content description as a more viable option. Since full understanding of natural language 
is not yet achievable, we use a caption based approach to express the description of media data. In 
order to achieve an automatic interpretation of captions we exploit techniques used in natural lan- 
guage understanding and artificial intelligence. 

The methodology we adopt consists of associating natural language captions to each multime- 
dia data and using the description to retrieve the relevant data. More precisely, the description of a 
multimedia data is matched against the description of a user query which is also expressed using 
natural language captions. The major problem with this approach is that it is generally the case that 
the description of a multimedia data does not exactly match the description of a user query. The 
reason is that it is difficult for different users or even the same user at different times to describe 
the same thing identically because they can use synonyms or generalize/specialize categories be- 
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longing to the domain of interest and so on. Hence, the key to efficient retrieval process is to auto- 
matically perform partial or approximate match of the description of a multimedia data to the 
description of a user query whenever exact match is not possible. In this paper, we propose an in- 
telligent approach to approximate matching by integrating object-oriented and natural language 
understanding techniques. 

The second issue addressed in this paper is new ways of interaction with the user. The user in- 
terface is an important part which strongly determines the effectiveness in using a system. In order 
to achieve a natural way of interacting with the MDBMS system we are developing a graphical user 
interface which makes the query specification easier compared to query languages like SQL. We 
found that in order to formulate complex queries a user partition it into smaller pieces and put them 
together in a later stage. This behavior is reflected in the principle of incremental query specifica- 
tion which is supported by our Graphical User Interface. In addition, we observed that, for a given 
database, the joins necessary to specify most of the queries correspond directly to natural language 
expressions. This leads to the principle of natural expression of joins also supported by our Graph- 
ical User Interface. Both principles are generally of use not only for multimedia systems or graph- 
ical user interfaces but for any database query interface. 

This paper makes three contributions. The first contribution is that context description of mul- 
timedia data is possible using natural language captions which can be interpreted automatically us- 
ing domain dependent knowledge. Another contribution is the formulation of a general scheme to 
retrieve data that comprises a variety of multimedia data stored in a database with special emphasis 
on approximate match. As far as we know, very little research on partial or approximate matching, 
especially in the natural language applications, has been conducted in natural language processing. 
The retrieval method may also be easily adopted into the field of intelligent information retrieval. 
Hereby we support the claim that object-oriented technology can be adopted and easily applied to 
multimedia systems application. The third contribution is the identification and application of two 
principles in the construction of a graphical user interface that help to make the query specification 
process easier. 

The paper is organized as follows: Section 2 discusses related work. Section 3 addresses fun- 
damental problems and outlines the architecture of the Multimedia Database Management System 
(MDBMS). Section 4 describes the natural language interpretation capabilities of the parser. Sec- 
tion 5 describes our approximate match algorithm used for the retrieval of multimedia data and 
Section 6 gives a short overview of the user interface. Finally, Section 7 gives the summary. 

2. Related Work 

Several multimedia projects have been undertaken by various researchers in both academia and 
industry over the past several years. The MINOS system [CHRIS 86] developed by a team at the 
University of Toronto manages highly structured multimedia objects that consist of attributes as 
well as the text, image and voice part. Sophisticated browsing and user interface features allow 
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browsing of the schema as well as synchronized updates. The MCC Database program 
[WOEL86,87] also undertook several multimedia projects by establishing the database require- 
ments of multimedia applications. They identified requirements for a data model and for the shar- 
ing and manipulation of multimedia data. Hypertext has also been extended to manage image and 
sound as well. One notable outcome is the INTERMEDIA system [YANK88] developed at Brown 
University. [MASU87] has developed a framework to classify and compare the different projects. 

The user interface is an important part of a database system especially when dealing with mul- 
timedia data because of their non-textual nature. Most of the research in the area of user interfaces 
focus on the entity-relationship [WONG82, FOGG84, ROGE88] or the more complex sematic and 
object-oriented data model [KING84, GOLD85, BRYC86, AGRA90] allowing queries to be di- 
rectly specified within the schema. In contrast we use an extension of the relational model to handle 
and manipulate the media data. In order to allow an easy query specification we provide a graphical 
user interface which incorporates incremental query specification and a natural way of expressing 
joins, differing in many ways from the well know OBE interface [ZL0077]. 

Another important aspect of a multimedia database system is the content retrieval of media 
data. The fundamental difficulty in the retrieval of multimedia data lies in the problem of handling 
the rich semantics that is contained in the data. In [LUM89], we introduced the approach of con- 
tents based search by means of natural language descriptions that form a part of a multimedia data. 
This approach is related to the research on artificial intelligence (AI) and information retrieval (IR). 
In the area of AI a variety of methods have been developed for the processing of natural language. 
Although the problem of full understanding of natural language has not yet been solved satisfacto- 
ry, powerful tools for parsing and interpretation of natural language have been developed. 
[GROSZ87] exemplifies the current state of the art. Most of the work done focus on complete un- 
derstanding of natural language requiring extensive knowledge bases with general world knowl- 
edge. Our approach is somewhat simpler. We are only dealing with a subset of natural language 
being broad enough to allow a natural description of the media data but easier to understand than 
the full, general language. Furthermore, we found that for most applications the knowledge base is 
domain specific allowing us to deal with a much smaller one for each domain. Both aspects con- 
tribute to an acceptable performance which is critical for a database system. 

In the domain of IR there had been early interest in using AI techniques [SPAR78, SMIT80]. 
The IRUS system [BATES 83] is more representative of modem attempts which is designed for 
processing heterogeneous data bases through natural language queries. The RUBRIC system 
[TONG87] is a production rule-based IR system in which the indexing base of the system contains 
positional information about words in the texts, which allow positional controls on words while 
processing queries. The I R system [CROF87] provides assistance to users at all stages of the re- 
trieval process and consists of a set of expert systems managed by a scheduler. Last but not least, 
the IOTA system [CHIA87] tries to improve the qualitative performance of IR systems in replacing 
keywords by noun groups involving extensive semantics. 
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The approach we propose is somewhat different from the intelligent IR systems mentioned. It 
is clear that most of the work in these systems is mainly concerned with natural language process- 
ing, particularly query processing, and deductive capabilities based on extended semantic model 
of document content and sometimes from the user. Our approach also shares these characteristics. 
However, the concept of matching function between system concepts and user concepts is based 
on exact matching in many systems while our approach is based on approximate matching. Even 
in systems with approximate matching capabilities, the matching function used are primitive or su- 
perficial at best compared to our approach which integrates object-oriented technology to natural 
language understanding to improve the quality of the matching process. 

3. Architecture of Multimedia Database Management System (MDBMS) 

In this section, we outline the architecture of the MDBMS. The architecture consists of the var- 
ious components of the MDBMS system. Before we continue, definitions and various issues asso- 
ciated with the data model used in the MDBMS system are addressed. 

3.1 Definitions and Background 

As mentioned before, multimedia data, in the broadest sense, consists of unformatted data such 
as text, image, voice, signals, etc. in addition to alphanumeric data. We define a multimedia data- 
base management system (MDBMS) as a system that manages all multimedia data and provide 
mechanisms to handle concurrency, consistency, and recovery in addition to providing a query lan- 
guage and query processing. 

Despite differences in data model and implementation aspects, all research projects on MD- 
BMS have decided to organize multimedia data using abstract data type (ADT) concept. This is 
generally accepted as the adequate approach. However, none of the projects have addressed the 
problem of content retrieval of multimedia data. 

The fundamental difficulty in handling multimedia data is intrinsically tied to a very rich se- 
mantics. To illustrate such a difficulty, let us look at an image of ships. Given such a picture, how 
are we to know what type of ships are in the picture. In other words, are the ships destroyers, cruis- 
ers, submarines or passenger ships? As another example, let us suppose that there is a picture of a 
dog and a cat. How do we know if they are chasing each other or playing? 

To answer queries posed on images, for example, a person must draw from a very rich experi- 
ence encountered in life to derive at a good answer. One must have a sophisticated technique to 
analyze the contents of the images to get the semantics of different things in the images. Technol- 
ogy today is not advanced enough to expect systems to have this kind of capability to answer mul- 
timedia query. However, we can use both AI and IR technology to do the next best thing. We can 
abstract the contents of multimedia data into words or text and use the text description equivalent 
of the original multimedia data to match the user request or query. This is the principle we will use 
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Figure 1 : Example for the Multimedia Data Format 

in designing a MDBMS to handle multimedia data for different applications. Figure 1 shows the 
format of a multimedia data which consists of the registration, raw and description data. 

Raw data is the bit string representation of the image, sound, signal, etc. obtained from scan- 
ning or digitizing the original multimedia data. Registration data generally enhances the informa- 
tion about raw data and is not redundant. The contents of a multimedia data is described by 
description data. Description data cannot be automatically derived by the computer given the tech- 
nology today. We assume that users will supply the description data for multimedia data in a natural 
language form. 

3.2 Architecture 

In this section, we present the various components of our MDBMS. This is the modified ver- 
sion of the architecture of a MDBMS discussed in [LUM89]. Our proposed architecture enhances 
the performance of the matcher component and adds the capabilities of the user interface which are 
lacking in the architecture proposed in [LUM89]. 

As shown in Figure 2, the components break down into user interface, query processor, data 
access and intelligent retrieval subsystem. The data access subsystem consists of conventional and 
media manager and controls the access to the actual data stored in relational and media DBMS. The 
intelligent retrieval subsystem is composed of parser, generator, matcher and description manager. 
The query processor accepts queries from users and executes them by calling the other compo- 
nents. When a new description for a multimedia data is entered, for example, the query processor 
calls the parser. The parser uses the dictionary to produce first-order predicates and return them to 
the query processor. The query processor then hands the predicates over to the description manager 
which then links the description to its multimedia data. 

When the query processor receives a query the first task is to decompose the query into sub- 
queries affecting only conventional or media part. The conventional subquery is passed directly to 
the conventional data manager without modifications. For the text description, the query processor 
calls the natural language parser to obtain the equivalent query predicates. The predicates are then 
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handed to the matcher. The matcher tries to match the query with the qualified multimedia data by 
comparing the predicates of the query with that of the stored multimedia data. The matcher does 
this by calling the description manager and using domain knowledge. In addition, if an exact match 
is not possible, the matcher automatically switches to approximate match. To guide the matching 
process, the matcher also gets input from the user. As the solution to the natural language part of a 
query, the query processor receives links to the qualified multimedia data. After combing them with 
the results of the conventional subquery the final results are retrieved by the Data Access Sub- 
system. 

The query processor, conventional and media object manager, description manager, parser and 
matcher have already been implemented as part of the MDBMS prototype system developed at the 
Naval Postgraduate School [MEYE88, LUM89, HOLT90, PEI90]. In this paper, we describe main 
components of the system: the natural language understanding capabilities of the parser, the pro- 
posed approximate matching process in the matcher and the interaction technique of the user inter- 
face. 

4. Natural Language Understanding in the Parser 

In this chapter we describe the natural language understanding capabilities of the parser. We 
outline that in order to accomplish the goal of content retrieval of multimedia data full, understand- 
ing of natural language is not necessary. However, a restricted interpretation is necessary which is 
done by the parser component using the application dependent dictionary as a semantic basis. 
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4.1 Natural Language Description for Multimedia Data 

As mentioned, we propose to perform retrieval of multimedia data by matching the natural lan- 
guage descriptions with the query specifications. We discarded the keyword search technique as a 
viable option because keywords are discrete and lack complex linking mechanisms to adequately 
capture the contents of multimedia data. In addition, it is not always possible to convey exact mean- 
ings using only keywords. In contrast, natural language descriptions allow the description of all 
kind of multimedia data with the additional advantage that everyone is familiar with it resulting in 
high acceptance rate. 

We believe that unrestricted natural language processing is very difficult to achieve given the AI 
technology today. We found that the language needed to describe multimedia data is much more for- 
mal than everyday English. Hence, instead of natural language description, we use captions to de- 
scribe multimedia data. Captions are a natural but special, stylized way of writing descriptions with 
a subset of natural language and not as difficult to parse and interpret as general natural language. 

Additionally, for a particular multimedia application the universe of discourse is usually quite 
constraint. Nouns tend to be concrete and most multimedia databases emphasize still photographs 
and other fixed time graphics to which few verbs can be applied thereby easing a difficult aspect of 
natural language processing. Important is that we use natural language only to access entities in a 
database making complete understanding of all aspects of a word unnecessary. The details of cap- 
tions and their restrictions for our objectives are beyond the scope of this paper and are given in 
[HOLT90, ROWE91]. 

4.2 Dictionary 

Besides the captions themselves, our system requires auxiliary information from a dictionary. 
The dictionary or lexicon is necessary for parsing and gives each possible natural language word 
its semantic: its part of speech, its grammatical form and the form of literals needed to represent it. 
Many of the words - for example, conjunctions and qualifying adjectives - are consistent in mean- 
ing across a wide range of domains; thus we can borrow their interpretation from existing natural 
language systems and include them in every dictionary. The words that significantly change be- 
tween applications are nouns and few verbs, have need to be defined for every application domain 
separately, but mostly their meaning is straightforward. To simplify matching, we are trying to limit 
the properties and relationships to a small set of primitives, for example we will not distinguish 
between the relationship asserted by the terms ‘within’, ‘inside’, ‘part of’, ‘containing’ and ‘com- 
prising’. This can be done without loss because in order to achieve efficient retrieval it is not nec- 
essary to capture the full meaning of an English expression, but just the main intent. 

The dictionary is an important part of the system which is application dependent. In order to 
allow an interpretation of natural language captions it defines the domain of each application thus 
restricting their vocabulary, the semantics and the knowledge of the system to apply all the infor- 
mation. 
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4.3 Natural Language Interpretation 

The parser translates the text description into a set of predicates called meaning list. The im- 
precision and ambiguity of the natural language descriptions is reduced considerably by transform- 
ing them into a set of predicates. These predicates state facts about the real world entities involved 
with multimedia data like their properties and relationships. As in most parsing methods, we chose 
the use of first-order predicate calculus as a formal representation of the description data. The pars- 
er depends on the dictionary to turn the descriptions into predicates. It is the parser’s task to use the 
dictionary to resolve synonyms and to check the syntactic context to resolve lexical ambiguities. 

Our parser also provides mechanisms to automatically partition a user query into the subject, 
verb and object components. This is essential in that, during data retrieval as we will see later, we 
can use the partitioned components to match against domain-dependent knowledge which also 
break down into subject, verb and object categories. Other important features of the parser are the 
use of supercaptions, a generalization of captions, and frames for stereotypical actions, allowing a 
set of predicates to be derived from terms in the description. 

Our current implementation of the parser uses augmented-transition network parsing and inter- 
pretation routines. It is implemented in Quintus Prolog and running on a SUN SPARC workstation. 
The details of the parser and the predicates are beyond the scope of this paper and are given in 
[LUM89, HOLT90, DULL90, ROWE91]. 

An example of natural language description and its translation into an equivalent set of predi- 
cates using the parser is shown below as follows: 

Description: “A car with red body ” 

Predicates: car(x), component(x,y), body(y), color(y,red) 

Choosing the right set of predicates is a very difficult task which is comparable to knowledge 
acquisition for expert systems. For the purposes of this paper, it is sufficient to assume that the dic- 
tionary lists all the words the parser can recognize, all the parts of speech associated with any word, 
and the predicates to use when a word appears in the description. Thus, the set of all predicates that 
can be used in the descriptions must be defined in the dictionary. 

5. Matching 

In this chapter, we propose new ways of matching natural language descriptions of the multi- 
media data with the query specifications. The key to our matching process is the use of the domain 
knowledge represented using the notion of class hierarchy borrowed from the object-oriented field. 
Before we continue, we first discuss some specific problems found in our current matching capa- 
bility that we eluded earlier. This will serve as the motivation behind our new intelligent approach 
to approximate matching. 
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5.1 Problems in Matching 

In our current system [LUM89, HOLT90], the result of parsing is one set of predicates per mul- 
timedia data instance. A query description is also entered in natural language and parsed. The ar- 
guments of the query predicates can be variables. A multimedia data is selected as the result of the 
query, if there exists a binding of query predicates to description predicates of multimedia data. The 
match of user query to multimedia data need not be exact. A set of rules, sometimes domain de- 
pendent, specifies situations in which sets of predicates that look different are really the same thing. 

The matching catches different natural language phrases with the same meaning, but not the 
semantic relationships among the predicates. For example, let us reconsider the description, “a car 
with red body” , of an image multimedia data. The predicates generated are "car(x), compo- 
nent(x,y), body(y), color(y,red)" . For the sake of argument, we consider a query with the descrip- 
tion, "a red car”. The query would be translated into something like “car(x), color(x,red)” . There 
would be no match because the system does not know that the color of a car’s body is identical to 
the color of the car. To overcome this problem, rules can be introduced to express the semantic re- 
lationships among the predicates. In the above case, the rule introduced could be: 

if (car(X), component(X,Y), body(Y), color(Y,Z)) then color(X,Z)\ 

Using the above rule, color(x,red) can be deduced in the example above and there would be a 
match between the query and the description. A key unsolved problem, however, is the question of 
which literals of the predicates to generalize to get a match, and how far to generalize. This falls 
into the category of approximate matching to a user query that we mentioned earlier in the paper. 
We believe that the answer lies in the use of domain-dependent knowledge. 

If we are just interested in exact matching of a user query to the description of a multimedia 
data, our current matching technique [LUM89, HOLT90] would be quite adequate. However, a 
common problem lies in the fact that the user query is likely to result in an empty answer in which 
no exact matching to the description of stored multimedia data occurs. In this case, an efficient sys- 
tem will try to perform approximate matching whereby descriptions of multimedia data that satisfy 
some generalization of the user query are selected. Our objective, then, is to perform approximate 
matching to a user query efficiently. As mentioned earlier, our proposed approximate matching al- 
gorithm makes use of domain-dependent knowledge to meet the objective. 

5.2 Domain-Dependent Knowledge 

Earlier, we justified the use of captions to describe multimedia data by stating that each multi- 
media application restricts the scope of the description of multimedia data. This means that the do- 
main of discourse for the captions are limited for each multimedia application. Domain-dependent 
knowledge are key concepts in the domain of discourse of the captions. For our purposes, we only 
include concepts of nouns and verbs in the domain-dependent knowledge. 

To represent domain-dependent knowledge, we chose the object-oriented data model 
[BANE87, KIM89, ZDON90]. The object-oriented model supports highly structured, complex ob- 



- 11 - 




Figure 3: Generalization Hierarchy of a Plane 



jects and can capture naturally any mini-world entity. The data model has been used widely in such 
areas as CAD/CAM, VLSI, office automation, software engineering and AI. Our justification for 
using the object-oriented model to represent domain-dependent knowledge is as follows: First, it 
supports generalization and specialization abstraction which permits conceptual generalization on 
the contents of the captions. Second, researchers [WOEL87, HOLT90] have identified the use of 
object-oriented model in multimedia database applications as an appropriate and viable option. 

Without loss of generality, we will restrict our domain to the domain of the military history of 
US forces in the Pacific during World War 2. The main reason is that we tested our current proto- 
type MDBMS in military application based on the domain of the US military history. For our pur- 
poses, we will apply our approximate matching technique to the domain of military history. 
However, we claim that our approximate matching technique can be applied to other multimedia 
applications. 

Figure 3 shows an example of the generalization hierarchy of a plane, a noun concept in our 
domain of discourse. It is the domain-dependent knowledge on planes that participated in the Pa- 
cific during World War 2. We assume that the reader is familiar with object-oriented concepts such 
as object, class, inheritance along class hierarchy or lattice and methods. We also assume that the 
direction of the arrow in Figure 3 is from a class to its subclass. In Figure 3, the Plane class is spe- 
cialized into classes Transport, Fighter, Bomber and Seaplane. Class Transport is specialized into 
class C-47 and class Fighter is specialized into classes F6F-Hellcat, Corsair and Zero. In addition, 
class Bomber is specialized into class B-25 and class Divebomber which is further specialized into 
classes Zero, Dauntless and Stuka. 

The generalization hierarchy of a plane is a class lattice since class Zero has two superclasses, 
namely class Fighter and class Divebomber. In addition, properties of superclasses are inherited by 
all their subclasses along the superclass/subclass hierarchy but not vice versa. 
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Figure 3 is one example of a domain-dependent knowledge corresponding to a noun (i.e. plane) 
concept in the domain of discourse. For our purposes, we can have domain-dependent knowledge 
for all noun and verb concepts in our domain of discourse. It is obvious that some of the noun and 
verb concepts may belong to the same class or generalization hierarchy. Hence, generalization hi- 
erarchy need not be created for each and every noun or verb concept. 

5.3 Partial Matching Algorithm 

In this section, we will discuss our partial matching algorithm. For clarity, we will devise our 
partial matching algorithm by following through an example. Unless explicitly stated, we will refer 
to the example generalization hierarchy given in Figure 3. Before we go on, we next discuss what 
it is that we are interested in doing. 

Suppose that we have images of planes stored in the multimedia database and the images are 
described as transport planes. Let us now assume that a user gives a query asking for all planes 
which are C-47s. Even though there are no exact matching, we should retrieve all transport planes 
stored because any C-47 is a transport plane according to the domain-dependent knowledge. Now, 
if the user asks for all fighter planes, we cannot simply retrieve all transport planes because they 
may not be what the user wants. However, a user asking for planes would more likely retrieve the 
stored transport planes than if he was to ask for fighter planes because a transport plane is still a 
plane but is not a fighter plane. 

The goal of our algorithm is also to minimize the influence of the definition of the hierarchy 
which is dependent on the designer. The generalization hierarchy designer might have a view of 
the domain dependent knowledge which may not be consistent with the view of other people. This 
phenomenon might bias some specific branch of the generalization hierarchy over other branches 
during partial matching. 

An efficient partial matching algorithm has to deal with all the problems such as the ones ad- 
dressed above and come up with a general solution. We solve these problems by using heuristics 
to assign a weight ranking system given a generalization hierarchy(ies). Our major objective is to 
come up with a weight ranking scheme that is both fair and accurate which can be used to deter- 
mine whether stored multimedia data should be retrieved given a user description. 

5.3.1 Weight Ranking Scheme within a Generalization Hierarchy 

In this section, we will discuss the weight ranking strategy used by our partial matching algo- 
rithm given a single generalization hierarchy. The weight ranking strategy used for a group of gen- 
eralization hierarchies will be discussed in the subsequent section. The weight ranking strategy 
used on a generalization hierarchy is a consequence of the semantics of the class hierarchy (lattice) 
or the IS-A hierarchy concept supported in an object-oriented data model. 

Given a class C in a generalization (class) hierarchy for a noun or a verb concept, and assuming 
that a class, other than C, with a rank of positive weight is a specialization of class C while one 
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with a rank of negative weight is a generalization of C, we can introduce the following two general 
heuristics. 

Heuristic 1: All direct (indirect) subclasses ofC have positive weights. 

Heuristic 2: All direct (indirect) superclasses ofC have negative weights. 

Heuristic 1 says that given a class C specified in a user query, all subclasses of C in the class 
hierarchy to which C belongs are specializations of the class and more weights (positive) are given. 
Heuristics 2 says that given a class C specified in a user query, all superclasses of C in the class 
hierarchy to which C belongs are generalization of the class and less weights (negative) are given. 
This reasoning follows directly from the definition of a class (IS-A) hierarchy and relationships 
among classes along the class hierarchy in the context of an object-oriented data model. 

The assignment of negative weights to generalization is intuitively clear. The assignment of 
positive weights to specialization is based on the fact that specialization inherits all properties of 
the parent nodes in addition to having its own additional information. Hence, we feel that positive 
or more weights should be assigned to the nodes in the paths towards specialization hierarchy. 

Given the heuristics, it is easy to see that all classes in the class hierarchy which have ranks of 
positive weights relative to the class C, which is specified in the user query as either a noun or a 
verb concept, are selected during approximate matching. This is because all classes with ranks of 
positive weights are subclasses (specialized classes) of the class C, specified by the user query. 
Since each of the classes is a specialized version of class C, it encompasses properties of class C 
and indeed is class C. 

On the other hand, all classes in the class hierarchy which have ranks of negative weights rel- 
ative to the class C which is specified by the user query should be restrictively selected depending 
on the weights. This is because all classes with ranks of negative weights are superclasses (gener- 
alized classes) of the specified class C along the class hierarchy. Since each of the classes is a gen- 
eralized version of class C, it does not encompass all properties of class C and is not class C. The 
question of which classes to select depends on getting information from the user on how far to gen- 
eralize. 

The weight ranking system we introduced so far is vague and is not well defined. What is de- 
fined is that given a class C in a class hierarchy of interest, any class belonging to the same class 
hierarchy which is assigned a positive weight is always selected during approximate matching. On 
the other hand, a class in the same class hierarchy which is assigned a negative weight is only se- 
lected during approximate matching if it exceeds a threshold given by the user. We now discuss the 
assignment of weights for different classes in the class hierarchy of interest. 

There are three different situations in which weights can be assigned to classes in a class hier- 
archy. The different situations are shown in Figure 4. Suppose that the class specified in a user que- 
ry is class C. As before, we assume that the direction of the arrow is from a class to its subclass. 
For example, in Figure4 (a), class C is a superclass of class X and class X is a subclass of class C. 
The first situation, shown in Figure 4(a), is to assign weight to a class (X or Y) which is a subclass 
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Figure 4: Three Situations in a Class Hierarchy 

of class C. The second situation, shown in Figure 4(b), is to assign weight to a class (X or Y) which 
is a superclass of class C. The third situation, shown in Figure 4(c), is to assign weight to a class 
(Y) which is a subclass of a superclass (i.e. X) of class C. 

The principles behind our weight ranking system are quite simple. We assume that all classes 
with positive weights and some classes with negative weights that exceed a threshold value are se- 
lected during approximate matching. First, we assign a weight of 0 to the class C specified in the 
user query. Class C is the reference point to all other classes in the class hierarchy during approx- 
imate matching. For classes which are subclasses of class C, we assign positive weights because 
they are specialized version of class C. Specialized versions of class C have more specific and def- 
inite information than C itself and hence are assigned positive weights instead of 0. For our pur- 
poses, all subclasses of C are assigned the same positive weight. 

For classes which are superclasses or subclasses of superclasses of class C, we assign negative 
weights because they are generalized version of class C. Generalized versions of class C have less 
and more general information than C itself and hence are assigned negative weights. Different gen- 
eralization versions have different negative weights. However, in assigning negative weights, we 
have to minimize the influence of the definition of the model. It is true that the further away a class 
is from class C in the class hierarchy, the more negative weight is assigned to the class. 

In most systems, the assignment of weight of a class is linearly inverse proportional to the depth 
level of the class relative to the level of the class C specified in the user query. We believe that this 
is not the correct approach because the relative distance of a particular class to the class of interest, 
in this case class C, with respect to other classes is not the absolute but some artificial distance 
caused by a particular designer’s view of the domain knowledge. The main problem with this ap- 
proach is that some classes belonging to some lengthy branch could be unfairly disqualified be- 
cause of higher negative weights. Our weight ranking system tries to minimize the bias against 
some lengthy branch of a class hierarchy over other shorter branches. 

Given that class C is the class specified by user query as shown in Figure 4, the assignment for- 
mulas of weights for classes in a class hierarchy according to the three different situations men- 
tioned are as follows. 

(1) Class specified by user query (i.e. class C in Figure 4) 




weight = 0 
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(2) Subclass of class C (i.e. class X or Y in Figure 4(a)) 
weight = a , where a is a integer constant 



(3) Superclass of class C (i.e. class X or Y in Figure 4(b)) 



weight = - 





, where a, p are integer constants and n is level # of super- 
class relative to class C 



(4) Subclass of a superclass of class C (i.e. class Y in Figure 4(c)) 



/ * , iw 1 , (i+l-jK 

weight = - ax£(p) - Y><X(^) ) , 

where a,p,D,Y are integer constants; h is level # of superclass relative to class C and / is 
level # of subclass relative to superclass 

In our scheme, a class which is assigned a positive weight is always selected during partial 
matching. A class with a negative weight can be selected provided that it does not exceed a thresh- 
old value set by the user. To understand the weight assignments for different classes, we next give 
some examples using the class hierarchy of Figure 3. Given a user query, if the image correspond- 
ing to the user description is not found in the database, the system then automatically proceeds with 
approximate matching. Using the weight assignment formulas and given some user query descrip- 
tions, the weights for some of the classes in the class hierarchy are as follows. For the sake of ar- 
gument, we assume that the values of a, P, y and x> are 40, 2, 48 and 2 respectively. 

(1) “A transport plane sank in the Pacific” 

Transport = 0, C-47 = 40, Plane = -20, Fighter = -44, Corsair = -56 

(2) “A F6F-Hellcat sank in the Pacific” 

F6F-Hellcat = 0, Plane = -30, Seaplane = -54, Stuka = -72, C-47 = -66 

(3) “A bomber sank in the Pacific” 

Bomber = 0, Stuka = 40, B-52 = 40, Plane = -20, Seaplane = -44, C-47 = -56 

In the examples shown, all classes which are assigned positive weights are selected during par- 
tial matching. In example (1), the class C-47 has a positive weight of 40. This means that the image 
whose description is “A C-47 sank in the Pacific” is selected during partial matching. As shown 
in the examples, all classes which are subclasses of the class which is specified in a user query are 
assigned positive weights. All classes which are superclasses or subclasses of the superclasses of 
the class which is specified in a user query are assigned negative weights. For these classes, the 
weight of a class is inversely proportional to the depth level of the class relative to the level of the 
class specified in the user query along the class hierarchy although they are not strictly linear. In 
example (2), the class Seaplane has a negative weight of -54. This means that the image whose de- 
scription is “A Seaplane sank in the Pacific” has a weight of - 54. Class Stuka has a negative 
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weight of -72 and class Stuka is further away from F6F-Hellcat than class Seaplane is from F6F- 
Hellcat. 

Suppose the weight of a class is linearly but inversely proportional to the depth level of the 
class relative to the level of the class C specified in the user query. If we assign a negative constant 
weight, say -10, for each level away from class C, the class which is 5 levels away from class C 
will have a negative weight of -50 compared to a negative value of -20 for a class which is 2 levels 
away from class C. For example, if the user query is “A transport sank in the Pacific”, the weight 
of class Transport is 0, class Seaplane is -20 and class Stuka is -40. Using our formulas, the same 
user query will assign weights of classes Transport, Seaplane and Stuka to be 0, -44 and -62 respec- 
tively. The weight of class Stuka is more biased against relative to the weight of class Seaplane us- 
ing the linear method over our method. 

It is very difficult to quantify how much closer class Seaplane is to class Transport over class 
Stuka to class Transport as both Seaplane and Stuka are types of planes. Our formulas are designed 
to minimize bias as best as possible. A user is more likely select a threshold value such that class 
Stuka is less likely selected during approximate matching over class Seaplane using the linear 
method compared to using our dynamic method. Another difficult task is to set the value of the con- 
stants to be applied in our assignment formulas as well as the threshold value. The user must choose 
the correct values for the constants and the threshold value depending on the number of objects that 
qualify during approximate matching. Hence, it is necessary for the system to interact with the user 
through the user interface throughout the matching process. 

5.3.2 Weight Ranking Scheme for a Group of Generalization Hierarchies 

In the previous section, we discussed the ranking of weights for classes belonging to the same 
generalization hierarchy. In this section, we extend the ranking of weights for classes belonging to 
different generalization hierarchies. In our scheme, the ranking of weights for classes in different 
class hierarchies is influenced by the following rules. 

Rule 1: For each local class hierarchy, the weight ranking system discussed in Section 4.3.1 
is applied. 

Rule 2: For different class hierarchies, the user determines the priority order of class hierar- 
chies. 

Using rule 1, for each class hierarchy selected by the user query, the classes within the class 
hierarchy are assigned weights using the weight ranking system discussed in Section 4.3.1 and is 
a straightforward process. Hence, regardless of the number of class hierarchies involved, all classes 
belonging to these hierarchies can be assigned weights for partial matching. It is easy to see that 
rule 1 does not cause any problems because there is no interrelationship between classes of differ- 
ent class hierarchies during weight assignments. 

The global ranking of weights for different class hierarchies is a problem because the weights 
assigned within each class hierarchy now has to be considered with respect to weights assigned for 
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Figure 5: Combination of two Class Hierarchies 

other class hierarchies and they have to be meaningful globally. Rule 2 is to determine the priority 
order of importance of the class hierarchies selected from a user through the user interface. Differ- 
ent class hierarchies can be assigned different weights according to the priority order of impor- 
tance. 

Figure 5 shows a combinations of classes belonging to two different class hierarchies. For our 
purposes, we now consider a user query description “C and D” involving classes C and D belong- 
ing to different class hierarchies in Figure 5. There are three generic partial matching combination 
types and they are given as follows. The example classes in the combination types are taken from 
Figure 5. CHI is the name of the class hierarchy on the left and CH2 is the name of the class hier- 
archy on the right in Figure 5. 

Type 1: C of CHI, and any class in CH2 except D. 

Type 2: Any class in CHI except C, and D of CH2. 

Type 3: Any class in CHI except C, and any class in CH2 except D 

The ranking of weights for type 1 and type 2 combinations are easy to handle by using the pre- 
viously discussed weight ranking system. This is because we only need to assign weights to classes 
in one of the class hierarchies but not both. However, handling type 3 combination requires a closer 
attention because it requires assigning weights to classes belonging to different class hierarchies. 
To assign weights in this case, we determine the priority order of CHI and CH2 through feedback 
from the user. Through the user interface, we get information on which class hierarchy has a higher 
priority. We then assign different weights for CHI and CH2 depending on the priority order. 

This can be expressed using the following weight formula. The constant values of a and p has 
to be determined by the user through the user interface. 

(1) Weight (Type 1) = Weight (CH2) 

(2) Weight (Type 2) = Weight (CHI) 

(3) Weight (Type 3) = a (Weight (CHI)) + p (Weight (CH2)) 

Figure 6 is a generalization (class) hierarchy of the noun Place concept. Using Figure 3 and Fig- 
ure 6, given a user query description “A C-47 sank in the Ocean”, an example of a type 1 combi- 
nation is a multimedia data with a description “A C-47 landed in an Island”. A type 2 combination 
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Figure 6: Generalization Hierarchy of a Place 



is a multimedia data with a description of “A Stuka sank in the Ocean”. Finally, a type 3 combi- 
nation is a multimedia data with a description “A Transport landed in Sea”. 

In this section, we discussed the assignment of weights for classes involving two different class 
hierarchies. For a practical system, the number of class hierarchies involved for weight assignment 
is obviously large since many noun and verb concepts are involved. It is not difficult to see that our 
weight ranking scheme discussed in this section can be easily extended to assign weights for class- 
es involving many class hierarchies. The main problem lies in how good the user interface is in 
getting the information from the user. Obviously, the weight ranking system has to be dynamic, 
since all constant values assigned by the user can change depending on the number of qualified 
multimedia data selected during partial matching. The user also has to determine the threshold val- 
ue such that not too many multimedia data are selected from the database. 

5.3.3 Application of Weighting Algorithm 

The application of the weighting algorithm just presented requires a parser to understand the 
natural language specifications in the multimedia data descriptions and the user queries. As stated 
earlier, the descriptions are parsed and stored in the system as predicates. The queries are processed 
as follows. 

When a query is received from the user, the parser separates the natural language specification 
into smaller component groups, namely subject noun, verb and object noun phrases. Each of these 
will actually become predicates. When these predicates match exactly with the predicates in the 
descriptions of certain multimedia data, those multimedia data will be retrieved. However, there 
may be other descriptions of multimedia data that are actually of interest to users but those descrip- 
tions are not stated as logically implied by the query. This latter category is expected to be the usual 
case rather than the former for reasons stated earlier. 

To find the latter, we suggest that system search in the noun and verb generalization hierarchies 
of the object classes and assign weights to the descriptions as given in the weight assignment algo- 
rithm, assigning the appropriate weighting factors (to and 6 in the previous section) as received 
from the user. These multimedia data with combined weight exceeding the threshold value set by 
the user will then be retrieved. 
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The separation of the natural language query can be in smaller components than the three groups 
just stated. For example, a complex noun phrase may be separated into a number of small noun 
groups and the weighting algorithm applied to these groups to obtain a combined weight. For ex- 
ample, ''''the man with a mustache" can become two classes, namely man and mustache. Naturally, 
the finer the granularity of the separation, the larger and the more complex the processing is needed. 

6. Graphical User Interface 

The goal of a graphical user interface is to support the query specification process allowing the 
user to efficiently use the database system. It should allow inexperienced users to retrieve data from 
the database without having to know a specific query language. In today’s database management 
systems the user is forced to think in terms of data model and query language, differing a lot from 
his way of thinking. Often a user can express a query easily in natural language, but has difficulties 
to express it in some given query language. 

Most queries involve both media and formatted data. For the media part of the query we use 
our intelligent matching algorithm which is directly processing natural language captions. For con- 
ditions on formatted data, natural language expressions are mostly too imprecise to be directly pro- 
cessed. We try to overcome this problem by providing a graphical user interface supporting a 
natural query specification. 

The data model adopted in our system is an extended relational data model. Despite some draw- 
backs the relational model has great advantages: It is well known, widely used and has a firm the- 
oretical basis. For our purpose, we extend the relational model to capture media data types and, as 
shown below, we also extend the query language to allow the manipulation of media data and fa- 
cilitate the query specification process. 

Before describing the user interface of the MDBMS system, we first outline ways to achieve a 
natural query specification process. 

6.1 Towards a natural query specification 

Usually, every user can describe a query (or at least the desired result) easily in natural lan- 
guage. Unfortunately, natural language expressions representing a query are imprecise and difficult 
to automatically translate into a formal query language to be understood by a database management 
system. We argue that the gap between the user’s way of expressing a query in natural language 
and database manipulation languages like SQL can be improved considerably. 

When comparing the user’s natural language (NL) expression for a query with corresponding 
SQL statements the first difficulty is that the table and attribute names do not exactly match. In a 
graphical user interface this problem is easy to overcome. All table and attribute names can be pre- 
sented to the user who simply selects the desired ones using a pointing device (e.g. mouse). 

Another difficulty is related to joins between tables. Mostly the join condition is hidden in the 
user’s NL expression. In examining a large number of queries expressed in natural language as well 
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as SQL we found that, in most cases, the join condition directly corresponds to some specific NL 
expressions. Additionally, the number of joins used in most of the queries was small compared to 
the number of possible joins. This can be explained by two facts. First, the number of semantically 
meaningful joins is restricted and second, some of the most frequently used joins are already in- 
tended at the design time of the database. In order to provide a natural way of expressing joins, in 
our system we allow database designer and user to define and name joins prior to its actual use. A 
predefined join can involve more than two tables (e.g. two tables are joined by means of a third 
table) thereby providing a simple way of expressing m:n relationships. Once defined and named, 
all predefined joins can be used to specify a query. Predefined joins differ from views: First, the 
result of a predefined join is not a table as in the case of a view but a specific connection between 
tables. Second, predefined joins allow connections between different levels in nested queries and 
even recursive joins can be expressed. An example using predefined joins is given in the next sec- 
tion. 



Another thing we learned in examining the process of query specification is the handling of 
complex queries. Given a complex data retrieval task the user is partitioning it into smaller subtasks 
which are easier to handle. Starting with the clear parts of the query the user deals with all parts 
and combines the results into the final solution. In our system we support this way of handling com- 
plex queries by an incremental query specification to be described in the next section. 

Finally, we observed that a special category of queries is easy to express in NL but rather com- 
plicated in a formal query language. Additional operators, closely related to corresponding NL ex- 
pressions, allow an easier and clearer query specification. Considering for example a query like 
‘Select the name of planes which can carry all weapons of the category air-to-air’ we found that a 
special ‘all’ operator greatly enhances the readability and understandability of the SQL-like query 
making it similar to the user’s NL expression. For the example, we presume to have the tables 
plane, weapon, plane weapon and a predefined join named carries expressing the m:n relationship 
between planes and weapons. 

select p_name from plane 

where plane carries weapon 

and w_nr = all (select w_nr from weapon 
where category = ‘air-to-air’) 

A SQL statement expressing the same query without the all operator is rather complicated. Two 
possibilities are: 



select p_name from plane 
where ((select w_nr from plane_weapon A 
where plane. w_nr = A.w_nr) 
contains 

(select w_nr from weapon 
where category = ‘air-to-air’) ) 



select p_name from plane 
where not exists 

(select * from plane_weapon B 

where B.w_nr in (select w_nr from weapon 

where category = ‘air-to-air’)) 
and not exists 

(select * from plane_weapon C 
where C. p_nr = B.p_nr 

and C.w_nr = B.W_nr) ) 
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6.2 Description of the Graphical User Interface 

In this paper, we will give a general idea of our graphical user interface by presenting a small 
example of the retrieval process. Due to space limitations we will only describe a small fraction of 
its capabilities. 

After selecting the database the user gets the system menu providing the main database manip- 
ulation functions: insert, delete, update or retrieve. When selecting retrieval, the user gets the query 
specification window and his first step is to select the tables to be used in the query. For each se- 
lected table a list with all attributes will be displayed in a separate window and all predefined con- 
nections involving at least one of the selected tables will appear in the Connections window. To 
specify the result list (projection) the user has to move the desired attributes to the Result List. Now 
only the conditions needs to be specified. Using connections, attributes of the selected tables and 
operators provided by the Tool Box the query can easily be built using the mouse. In the Query Rep- 
resentation window the query is displayed graphically. Each part of the query is represented by a 
small box, simple conditions by a single, subqueries by a double box, and the connection lines are 
labeled with the kind of connection used. An advantage is that every part of the query can be ad- 
dressed for edit or delete at any time during the query specification process. To enhance the clarity 
of display parts of the query can be grouped together and displayed as one box (zoom in). If the 
user wants to see the query in full detail at a later stage he can use the zoom out option. 

To support incremental query specification we allow the user to start with any part of the query 
and combine the separate parts at a later stage. Additionally, we provide an option to save and re- 
load any part of the query for later use. 

Another important part is the way of specifying the natural language description part of a query 
necessary when media data are involved. If the user selects a media attribute in the specification of 
the condition, automatically a special description editor will be displayed in a separate window 
where the media description can be specified. The description editor has special features including 
buttons to check the description, present the hierarchy for a word and enter the weight of the dif- 
ferent parts of the description needed for the approximate matching. 

To further explain the query specification process we will consider the following example: 
‘Select the name, air base and image of planes which can carry all weapons of the category 
air-to-air and where the image shows the plane attacking a hostile plane’ . 

If a user wants to specify the query he might want to start with an easy part, e.g. ‘ weapons of 
the category air-to-air’ . To specify this part the user first selects Subquery in the Tool Box provid- 
ing him a second double box for his subquery. Then he selects weapon in the Tables window. As a 
result he gets all attributes of the weapon’s table in a separate window and by clicking to w_nr he 
selects the desired attribute. The next step is to specify the condition. By clicking to Cond in the 
Tool Box he gets an empty condition box in the Query Representation window and by clicking to 
the attribute category in the weapon’s window, '=’ in the Tool Box and typing in air-to-air he fills 
the box with the actual condition. 
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Figure 7: Description Editor 



As the next part the user might want to specify the image description condition 'image shows 
the plane attacking a hostile plane'. The specification process for this part is similar to the speci- 
fication of the first part. The user selects the plane table and after getting a new condition box he 
selects the attribute image from the plane window. Because image is a media table, the system au- 
tomatically provides the special Description Editor window (see Figure 7). In this window the user 
can type the natural language description for the image, in our example ‘Plane attacks a hostile 
plane'. When clicking the Done button the description will direcdy be interpreted by the parser to 
get the equivalent predicates. 

The last step is to specify the main part of the query and to compose the parts into the final re- 
sult. Starting with the beginning of the query (‘ Select name, air base and image’) the user moves 
the attributes pjiame , air_base and image to the Result List window. By selecting Cond from Tool 
Box and plane carry weapon from the connections window the user specifies the join condition. 
Now as the last part of the query the user has to specify the all condition. This can be accomplished 
by getting a new condition box, clicking to w_nr in the weapons window, '=’ and ‘all' in the Tool 
Box and the double box representing the subquery ‘weapons of the category air-to-air’ in the Que- 
ry Representation window. The last step is to combine the conditions into the final result. This is 
done by selecting the conditions and the logical operator AND from the Tool Box. In Figure 8 the 
final result of the query specification process is shown. 

To represent the results we choose a combined form and list oriented approach. Generally, the 
results are presented as a list. Media attributes are represented as buttons allowing to access the 
media data. By clicking to a row of the list a single tuple can be obtained in a form. Figure 9 shows 
the results of our example and the representation of one tuple in a customized form. 

In this paper, we presented only a small part of our graphical user interface. The data definition, 
insert, update and deletion operation, query processing and optimization issues, predefined joins, 
special operators and their semantics are far beyond the scope of this paper and will be presented 
in a later paper [KEIM91]. 
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7. Summary 

A major problem faced in a multimedia database system is the retrieval of multimedia data such 
as a sound or an image. Media data is intrinsically rich in semantics and conventional search meth- 
ods used in databases and information retrieval systems may not work or are of little use. Most re- 
search on intelligent IR systems are concerned with natural language processing and deductive 
capabilities based on extended semantic model of document content and also from the user. How- 
ever, most of them deal with exact matching or primitive partial matching using simple linear meth- 
ods. Another problem faced in today’s database systems is the lack of a natural way to specify 
complex queries. It is caused by the gap between the user’s way of thinking and the query languag- 
es used in most systems. Although a lot of work has been done in the area of user interfaces for 
database systems no query language comes close to the natural query specification process used by 
humans. 

In this paper, we discussed these fundamental problems and outlined the architecture of our 
MDBMS system. One contribution of our paper is the formulation of a partial matching algorithm 
that uses domain knowledge, represented using an object-oriented data model, and weight ranking 
system to assign weights to different multimedia data stored in a database and selects those multi- 
media data that partially matches a given user query description. Our parser, unlike others, provides 
an interpretation of natural language descriptions needed to achieve an intelligent retrieval of mul- 
timedia data. Additionally, it provides a mechanisms to automatically partition a user query into 
the subject, verb and object components. This is essential in that, during data retrieval, we used the 
partitioned components to match against generalization hierarchies of domain-dependent knowl- 
edge which also deals with subject, noun and object categories. Further research is necessary to 
improve the parser to also automatically derive adjectives and other caption components for com- 
plete understanding and processing of captions in the context of partial matching. 

A second contribution of this paper is our graphical user interface. It shortens the gap between 
the user’s way of thinking and formal query languages by using graphical user interaction. In our 
system, we support an incremental query specification, predefined joins and special operators to 
make the query specification process user friendly. The user is guided as much as possible allowing 
a quick and almost faultless query specification. Further research is necessary to come even closer 
to the user’s way of query specification e.g. by allowing the user to directly communicate with the 
system in natural language. 

We believe that our system provides a simple and elegant approach to both retrieval of multi- 
media data and query specification. The simplicity of our retrieval method lies in exploiting the se- 
mantics of generalization and specialization abstraction of the object-oriented model; the 
simplicity of the user interface lies in the natural way of query specification being directly obtained 
from queries expressed in natural language. We also believe that our approaches are general ones 
that can be readily applied to other areas. Our retrieval method can be used for other applications 
in IR and AI and the ideas of our user interface can be applied to most database query interfaces. 
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Figure 8: Screen after Specifying the Query 




Figure 9: Screen with the Results of the Query 
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